Abstract

Form a random k-SAT formula on n variables by selecting uniformly and independently $m = rn$ clauses out of all $2^k \binom{n}{k}$ possible k-clauses. The Satisfiability Threshold Conjecture asserts that for each k there exists a constant $r_k$ such that, as n tends to infinity, the probability that the formula is satisfiable tends to 1 if $r < r_k$ and to 0 if $r > r_k$. It has long been known that $2^k/k < r_k < 2^k$. We prove that $r_k > 2^{k-1} \ln 2 - d_k$, where $d_k \to (1 + \ln 2)/2$. Our proof also allows a blurry glimpse of the "geometry" of the set of satisfying truth assignments.



1. Introduction 

Satisfiability has received a great deal of study as the canonical NP-complete problem. In the last twenty years some of this work has been devoted to the study of randomly generated formulas and the performance of satisfiability algorithms on them. Among the many proposed distributions for generating satisfiability instances, random k-SAT has received the lion's share of attention.

For some canonical set V of n Boolean variables, let $C_k = C_k(V)$ denote the set of all $2^k \binom{n}{k}$ possible disjunctions of k distinct, non-complementary literals from V (k-clauses). A random k-SAT formula $F_k(n, m)$ is formed by selecting uniformly, independently, and with replacement m clauses from $C_k$ and taking their conjunction (see footnote 1). We will be interested in random formulas as n grows. In particular, we will say that a sequence of random events $\mathcal{E}_n$ occurs with high probability (w.h.p.) if $\lim_{n \to \infty} \Pr[\mathcal{E}_n] = 1$.

There are at least two reasons for the popularity of random k-SAT. The first reason is that while random k-SAT instances are trivial to generate they appear very hard to solve, at least for some values of the distribution parameters. The second reason is that the underlying formulas appear to enjoy a number of intriguing mathematical properties, including 0-1 laws and a form of expansion.

The mathematical investigation of random k-SAT began with the work of Franco and Paull [?]. Among other results, they observed that $F_k(n, m = rn)$ is w.h.p. unsatisfiable if $r > 2^k \ln 2$. To see this, fix any truth assignment and observe that a random k-clause is satisfied by it with probability $1 - 2^{-k}$. Therefore, the expected number of satisfying truth assignments of $F_k(n, m = rn)$ is $[2(1 - 2^{-k})^r]^n = o(1)$ for $r > 2^k \ln 2$. Shortly afterwards, Chao and Franco [?] complemented this result by proving that for all $k \ge 3$, if $r < 2^k/k$ then the following linear-time algorithm, called UNIT CLAUSE (UC), finds a satisfying truth assignment with probability at least $\epsilon = \epsilon(r) > 0$:

If there exist unit clauses, pick one randomly and satisfy it;
else pick a random unset variable and set it to 0.
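The rule above can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the literal encoding (a positive integer v for the variable, a negative one for its negation) and the helper names `random_ksat`, `unit_clause` and `satisfies` are hypothetical, and the sketch makes no attempt to achieve the linear running time of the original algorithm.

```python
import random

def random_ksat(n, m, k, rng):
    """m clauses, each over k distinct variables with independent random signs.
    Literal +v means variable v, literal -v means its negation."""
    return [[v if rng.random() < 0.5 else -v
             for v in rng.sample(range(1, n + 1), k)]
            for _ in range(m)]

def unit_clause(clauses, n, rng):
    """One run of UC. Returns a dict variable -> bool, or None if an empty
    clause (a contradiction) is produced along the way."""
    clauses = [list(c) for c in clauses]
    assignment = {}
    while True:
        units = [c for c in clauses if len(c) == 1]
        if units:
            lit = rng.choice(units)[0]          # satisfy a random unit clause
        else:
            unset = [v for v in range(1, n + 1) if v not in assignment]
            if not unset:
                return assignment               # every variable is set
            lit = -rng.choice(unset)            # set a random unset variable to 0
        assignment[abs(lit)] = lit > 0
        reduced = []
        for c in clauses:
            if lit in c:
                continue                        # clause satisfied, drop it
            c = [x for x in c if x != -lit]     # remove the falsified literal
            if not c:
                return None                     # empty clause: UC fails
            reduced.append(c)
        clauses = reduced

def satisfies(clauses, assignment):
    """True if every clause has at least one literal made true by assignment."""
    return all(any(assignment[abs(l)] == (l > 0) for l in c) for c in clauses)
```

Since UC never backtracks, a single run either returns a satisfying assignment or gives up; the result of Chao and Franco concerns the probability of the former event.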

A seminal result in the area was established a few years later by Chvátal and Szemerédi [?]. Extending the work of Haken [18] and Urquhart [?] they proved the following: for all $k \ge 3$, if $r > 2^k \ln 2$, then w.h.p. $F_k(n, rn)$ is unsatisfiable and every resolution proof of its unsatisfiability must contain at least $(1 + \epsilon)^n$ clauses, for some $\epsilon = \epsilon(k, r) > 0$.

Random k-SAT owes a lot of its popularity to the experimental work of Selman, Mitchell and Levesque [?] who considered the performance of a number of practical algorithms on random 3-SAT instances. Across different algorithms, their experiments consistently drew the following picture: for $r < 4$, a satisfying truth assignment can be found easily for almost all formulas; for $r > 4.5$, almost all formulas are unsatisfiable; and for $r \approx 4.2$, a satisfying truth assignment can be found for roughly half the formulas, while the observed computational effort is maximized. The following conjecture, formulated independently by a number of researchers, captures the suggested 0-1 law:

Footnote 1: In fact, our discussion and results hold in all common models for random k-SAT, e.g. when clause replacement is not allowed and/or when each k-clause is formed by selecting k literals uniformly at random with replacement.

Satisfiability Threshold Conjecture For each $k \ge 2$, there exists a constant $r_k$ such that

$$\lim_{n \to \infty} \Pr[F_k(n, rn) \text{ is satisfiable}] = \begin{cases} 1 & \text{if } r < r_k \\ 0 & \text{if } r > r_k \end{cases}$$



The conjecture was settled early on for the linear-time solvable case k = 2: independently, Chvátal and Reed [?], Fernandez de la Vega [?], and Goerdt [17] proved $r_2 = 1$. For $k \ge 3$, neither the value nor the existence of $r_k$ has been established. Friedgut [?], though, has proved the existence of a critical sequence $r_k(n)$ around which the probability of satisfiability goes from 1 to 0. In the following, we will take the liberty of writing $r_k \ge r^*$ if $F_k(n, rn)$ is satisfiable w.h.p. for all $r < r^*$ (and analogously for $r_k \le r^*$).

Chvátal and Reed [?], besides proving $r_2 = 1$, gave the first lower bound for $r_k$, strengthening the positive-probability result of Chao and Franco [?]. In particular, they considered a generalization of UC, called SC, which in the absence of unit clauses satisfies a random literal in a random 2-clause (and in the absence of 2-clauses satisfies a uniformly random literal). They proved that for all $k \ge 3$, if $r < (3/8) 2^k/k$ then SC finds a satisfying truth assignment w.h.p.

In the last ten years, the satisfiability threshold conjecture has received attention in theoretical computer science, mathematics and, more recently, statistical physics. A large fraction of this attention has been devoted to the first computationally non-trivial case, k = 3, and a long series of results [?] has narrowed the potential range of $r_3$. Currently this is pinned between 3.42 by Kaporis, Kirousis and Lalas [?] and 4.506 by Dubois and Boufkhad [?]. All upper bounds for $r_3$ come from probabilistic counting arguments, refining the idea of counting the expected number of satisfying truth assignments. All lower bounds, on the other hand, have been algorithmic, the refinement lying in considering progressively more sophisticated algorithms.

Unfortunately, for general k, neither of these two approaches has helped narrow the asymptotic gap between the upper and lower bounds for $r_k$. The known techniques improve upon $r_k \le 2^k \ln 2$ by a small additive constant, while the best lower bound comes from Frieze and Suen's [16] analysis of a full generalization of UC:

Satisfy a random literal in a random shortest clause.

This gives $r_k \ge c_k 2^k/k$ where $\lim_{k \to \infty} c_k = 1.817\ldots$



If one chooses to live unencumbered by the burden of mathematical proof, then a powerful non-rigorous technique of statistical physics known as the "replica trick" is available. So far, predictions based on the replica trick have exhibited a strong (but not perfect) correlation with the (empirically observed) truth. Using this technique, Monasson and Zecchina [?] predicted $r_k \simeq 2^k \ln 2$. Like most arguments based on the replica trick, their argument is mathematically sophisticated but far from being rigorous.

If one indeed believes that the correct answer lies closer to the upper bound (for whatever reason) then analyzing more sophisticated satisfiability algorithms is an available option. Unfortunately, after a few steps down this path one is usually forced to choose between rather naive algorithms, which can be analyzed, or more sophisticated algorithms that might get closer to the threshold, but are much harder to analyze. In particular, the lack of progress over $c\,2^k/k < r_k < 2^k \ln 2$ in the last ten years suggests the possibility that no (naive) algorithm can significantly improve the lower bound. At the same time, it is clear that proving lower bounds by analyzing algorithms is doing "more than we need": we not only get a proof that a satisfying assignment exists but an explicit procedure for finding one.

In this paper, we eliminate the asymptotic gap for $r_k$ by using the "second moment" method. Employing such a non-constructive argument allows us to overcome the limitations of current algorithmic techniques or, at least, of our capacity to analyze them. At the same time, not pursuing some particular satisfying truth assignment affords us a first, blurry glimpse of the "geometry" of the set of satisfying truth assignments. Our main result is the following.



Theorem 1 For all $k \ge 2$, $r_k \ge 2^{k-1} \ln 2 - d_k$, where $d_k \to (1 + \ln 2)/2$.



As we will see shortly, a straightforward application of the second moment method to random k-SAT fails rather dramatically: if X denotes the number of satisfying truth assignments, then $E[X^2] > (1 + \epsilon)^n E[X]^2$ for any $r > 0$. To prove Theorem 1 it will be crucial to focus on those satisfying truth assignments whose complement is also satisfying. Observe that this is equivalent to interpreting $F_k(n, m)$ as an instance of Not All Equal (NAE) k-SAT, where a truth assignment is NAE-satisfying if every clause contains at least one satisfied literal and at least one unsatisfied literal.

Analogously to random k-SAT, it is trivial to show that if $r > 2^{k-1} \ln 2 - (\ln 2)/2$ then w.h.p. $F_k(n, m = rn)$ has no NAE-satisfying truth assignments since their expected number is $o(1)$. We match this within an additive constant.
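For completeness, here is one way to carry out this first-moment calculation. It is a sketch of a standard argument rather than a quotation from the original text, and the shorthand $x = 2^{1-k}$ is ours.

```latex
\begin{align*}
E[X] &= 2^n \left(1 - 2^{1-k}\right)^{rn} = \left[\,2(1-x)^r\,\right]^n,
  \qquad x = 2^{1-k}, \\
-\ln(1-x) &= \sum_{j \ge 1} \frac{x^j}{j}
  \;>\; \sum_{j \ge 1} \frac{x^j}{2^{j-1}} = \frac{2x}{2-x}
  \qquad (\text{since } j \le 2^{j-1}, \text{ strictly for } j \ge 3), \\
r \ge \frac{\ln 2}{x} - \frac{\ln 2}{2} = \frac{(2-x)\ln 2}{2x}
  &\;\Longrightarrow\;
  r \cdot \bigl(-\ln(1-x)\bigr) > \frac{(2-x)\ln 2}{2x} \cdot \frac{2x}{2-x} = \ln 2
  \;\Longrightarrow\; 2(1-x)^r < 1 .
\end{align*}
```

Since $1/x = 2^{k-1}$, the hypothesis in the last line reads $r \ge 2^{k-1} \ln 2 - (\ln 2)/2$, and then $E[X] = c^n$ for a constant $c < 1$, so that $\Pr[X > 0] \le E[X] = o(1)$.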

Theorem 2 There exists a sequence $t_k \to 1/2$ such that if

$$r < 2^{k-1} \ln 2 - (\ln 2)/2 - t_k$$

then w.h.p. $F_k(n, rn)$ is NAE-satisfiable.



Theorem 1 follows trivially from Theorem 2 since any NAE-satisfying assignment is also a satisfying assignment. Our method actually yields an explicit lower bound for the random NAE k-SAT threshold for each value of k as the solution to a transcendental equation (yet one without an attractive closed form, hence Theorem 2). It is, perhaps, worth comparing our lower bound for the NAE k-SAT threshold with the upper bound derived using the technique of [23] for small values of k. Even for k = 3, our lower bound is competitive with the best known lower bound of 1.514, obtained by analyzing a generalization of UC that minimizes the number of unit clauses [?]. For larger k, the gap between the upper and the lower bound rapidly converges to $\approx 1/4$.



k          3         5         7         10         12
Lower      3/2       9.973     43.432    354.027    1418.712
Upper      2.214     10.505    43.768    354.295    1418.969

Table 1. Bounds for the random NAE k-SAT threshold.

Recently, and independently of our work, Frieze and Wormald [15] showed that another way to successfully apply the second moment to random k-SAT is to let k grow with n. In particular, let $\omega = k - \log_2 n \to \infty$, let $m_0 = -\frac{n \ln 2}{\ln(1 - 2^{-k})} = (2^k + O(1))\, n \ln 2$ and let $\epsilon = \epsilon(n) > 0$ be such that $\epsilon n \to \infty$. Then, $F_k(n, m)$ is w.h.p. satisfiable if $m < (1 - \epsilon) m_0$ but w.h.p. unsatisfiable if $m > (1 + \epsilon) m_0$.

We prove Theorem 2 by applying the following version of the second moment method (see Exercise 3.6 in [26]).

Lemma 1 For any non-negative random variable X,

$$\Pr[X > 0] \ge \frac{E[X]^2}{E[X^2]} . \tag{1}$$
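One standard way to see why such a bound holds is through the Cauchy-Schwarz inequality. The short derivation below is a sketch of that textbook argument, assuming $0 < E[X^2] < \infty$; it is not taken from the original text, and $\mathbf{1}_{X>0}$ denotes the indicator of the event $X > 0$.

```latex
\begin{align*}
E[X] = E\bigl[X \cdot \mathbf{1}_{X>0}\bigr]
  &\le \sqrt{E[X^2] \cdot E\bigl[\mathbf{1}_{X>0}^2\bigr]}
   = \sqrt{E[X^2] \cdot \Pr[X>0]} \\
\Longrightarrow \quad \Pr[X>0] &\ge \frac{E[X]^2}{E[X^2]} .
\end{align*}
```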



In particular, let $X \ge 0$ be the number of NAE-satisfying assignments of $F_k(n, m = rn)$. We will prove that for all $\epsilon > 0$ and all $k \ge k_0(\epsilon)$, if

$$r < 2^{k-1} \ln 2 - \frac{\ln 2}{2} - \frac{1}{2} - \epsilon$$

then there exists some constant $C = C(k)$ such that

$$E[X^2] < C \times E[X]^2 .$$

By Lemma 1, this implies

$$\Pr[F_k(n, rn) \text{ is NAE-satisfiable}] = \Pr[X > 0] > 1/C .$$

To get Theorems 1 and 2 we boost this positive probability to $1 - o(1)$ by employing the following corollary of the aforementioned non-uniform threshold for random k-SAT [?] (and its analogue for random NAE k-SAT):

Corollary 1 If $\liminf_{n \to \infty} \Pr[F_k(n, r^* n) \text{ is satisfiable}] > 0$, then $F_k(n, rn)$ is satisfiable w.h.p. for $r < r^*$.



In the next section we give some intuition on why the second moment method fails when X is the number of satisfying truth assignments, and how letting X be the number of NAE-satisfying assignments rectifies the problem. In Section 3 we give some related general observations and point out potential connections to statistical physics. We lay the groundwork for bounding $E[X^2]$ in Section 4. The actual bounding happens in Section 5. We conclude with some discussion in Section 6.

2. The second moment method 
2.1. Random k-SAT

Let X denote the number of satisfying assignments of $F_k(n, m)$. Since X is the sum of $2^n$ indicator random variables, linearity of expectation implies that to bound $E[X^2]$ we can consider all $4^n$ ordered pairs of truth assignments and bound the probability that both assignments in each pair are satisfying. It is easy to see that, by symmetry, for any pair of truth assignments s, t this probability depends only on the number of variables assigned the same value by s and t, i.e., their overlap. Thus, we can write $E[X^2]$ as a sum with n + 1 terms, one for each possible value of the overlap z, the zth such term being: $2^n$ (counting over s) × an "entropic" $\binom{n}{z}$ factor (counting overlap locations) × a "correlation" factor measuring the probability that truth assignments s, t having overlap z are both satisfying.

Now, as we saw earlier, $E[X] = [2(1 - 2^{-k})^r]^n = c^n$. Thus, if r is such that $c < 1$, then $\Pr[X > 0] \le E[X] = o(1)$ and we readily know that $F_k(n, rn)$ is w.h.p. unsatisfiable. (Note that $\Pr[X > 0] = o(1)$ even when $c = 1$ since the naive upper bound is not tight.) Therefore, we are only interested in the case where $E[X^2] \ge E[X]^2 = (1 + \epsilon)^n$ for some $\epsilon = \epsilon(r) > 0$. Since the sum defining $E[X^2]$ has only n + 1 terms we see that, up to polynomial factors, $E[X^2]$ is equal to the contribution of the term maximizing the "entropy-correlation" product.

Observe, now, that if $z = n/2$, then the probability that s and t are both satisfying is the square of the probability that one of them is. To see this take s to be, say, the all 0s assignment and consider the set of clauses this precludes from being in the formula. Thus, for truth assignments that overlap on n/2 bits, the events of being satisfying are independent. Therefore, up to polynomial factors, $E[X]^2$ is equal to the $z = n/2$ term of the sum defining $E[X^2]$.

From the above discussion, letting $\alpha = z/n$, we see that if the entropy-correlation factor is maximized at some $\alpha \ne 1/2$ then the second moment method fails. On the other hand, as we will see, if the maximum does indeed occur at $\alpha = 1/2$, then the polynomial factors cancel out and the ratio $E[X^2]/E[X]^2$ is bounded by a constant C independent of n, implying that in that case $\Pr[X > 0] \ge 1/C$.



[Figure 1 plot: horizontal axis $\alpha$; curves for k = 5, r = 16, 18, 20, 22, 24 (top to bottom).]

Figure 1. The nth root of $2^n$ × the entropy-correlation product for k-SAT as a function of the overlap $\alpha = z/n$.

With these observations in mind, in Fig. 1 we plot the nth root of each of the n + 1 terms contributing to $E[X^2]$ as a function of $\alpha = z/n$ for k = 5 and different values of r.

Unfortunately, we see that for all values of r considered the maximum lies to the right of $\alpha = 1/2$. The reason for this is that the correlation factor for k-SAT is strictly increasing with $\alpha = z/n$. For instance, as we saw above, if s is satisfying and t has an overlap of $z = n/2$ with s, then the conditional probability that t is also satisfying equals its a priori value $(1 - 1/2^k)^m$. But if z decreases, say, to 0 then the conditional probability that t is satisfying decreases to $(1 - 1/(2^k - 1))^m$, penalizing $t = \bar{s}$ exponentially and making it the least likely assignment to be satisfying.

This asymmetry in the correlation factor implies that for all $r > 0$ its product with the (symmetric) entropy factor is maximized at some $\alpha > 1/2$. Therefore, $E[X^2]$ is greater than $E[X]^2$ by an exponential factor for all $r > 0$, and Lemma 1 fails to give any non-trivial lower bound. To have any hope of getting a lower bound by the second moment method we need to consider a set of satisfying assignments for which the derivative of the correlation factor at 1/2 is zero.
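The failure described above is easy to observe numerically. The sketch below rests on an assumption that is ours and is not spelled out in the text: for a clause of k literals chosen independently, a pair s, t with overlap fraction $\alpha$ satisfies it with probability $1 - 2^{1-k} + 2^{-k} \alpha^k$ (each assignment violates the clause with probability $2^{-k}$, and both do with probability $2^{-k} \alpha^k$). The function names are hypothetical.

```python
import math

def ksat_term_root(alpha, k, r):
    """nth root, up to polynomial factors, of the overlap-alpha term of E[X^2]
    for random k-SAT: 2 * exp(entropy(alpha)) * correlation(alpha)^r."""
    # Assumed correlation factor: Pr[a random clause is satisfied by both s and t].
    corr = 1 - 2 ** (1 - k) + 2 ** (-k) * alpha ** k
    entropy = -sum(x * math.log(x) for x in (alpha, 1 - alpha) if x > 0)
    return 2 * math.exp(entropy) * corr ** r

def ksat_argmax_overlap(k, r, steps=1000):
    """Overlap fraction on a uniform grid that maximizes the term."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda a: ksat_term_root(a, k, r))

if __name__ == "__main__":
    for r in (16, 18, 20, 22, 24):
        print(r, ksat_argmax_overlap(5, r))
```

For k = 5 and each of the densities shown in Fig. 1, the maximizing overlap returned by this sketch lies strictly to the right of 1/2, while the value at $\alpha = 1/2$ coincides with the nth root of $E[X]^2$.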

2.2. Random NAE k-SAT

One attractive feature of the second moment method is that we are free to apply it to any random variable X such that $X > 0$ implies that $F_k(n, m)$ is satisfiable. In particular, we can refine our earlier application of the method by focusing on any subset of the set of satisfying assignments.

Considering only assignments that are NAE-satisfying (or, equivalently, whose complement is also satisfying) makes the correlation factor symmetric around $\alpha = 1/2$ as twin satisfying assignments $s$ and $\bar{s}$ provide an equal "tug" to every other truth assignment t. As a result, we always have a local extremum at $\alpha = 1/2$ since both the correlation factor and the entropy are symmetric around it. Moreover, since the entropic term is independent of r, this extremum is a local maximum for sufficiently small r. Whenever this is also the global maximum, the second moment succeeds.
[Figure 2 plots. Top panel: k = 5, r = 8, 9, 10, 11, 12 (top to bottom). Bottom panel: k = 5, r = 9.973.]

Figure 2. The nth root of $2^n$ × the entropy-correlation product for NAE k-SAT as a function of the overlap $\alpha = z/n$.

In Fig. 2 we plot the nth root of the entropy-correlation product for NAE k-SAT for various values of r. Let us start with the top picture, where k = 5 and r increases from 8 to 12 as we go from top to bottom. For r = 8, 9 we see that, indeed, the global maximum occurs at $\alpha = 1/2$. As a result, for such r we have $E[X^2] = \Theta(E[X]^2)$, implying that the formula is NAE-satisfiable with positive probability.

For the cases r = 11, 12, on the other hand, we see that at $\alpha = 1/2$ the function has dropped below 1 and therefore $E[X]^2 = o(1)$, implying that w.h.p. $F_k(n, rn)$ has no NAE-satisfying truth assignment. It is worth noting that for r = 11 we have $\Pr[X > 0] = o(1)$, even though $E[X^2]$ is exponentially large, due to the maxima close to 0 and 1.

The most interesting case is r = 10 where $\alpha = 1/2$ is a local maximum (and greater than 1) but the global maxima occur at $\alpha \approx 0.08, 0.92$ where the function equals 1.0145... (vs. 1.0023... at $\alpha = 1/2$). Because of this, we have $E[X^2]/E[X]^2 > (1.0144/1.0024)^n$, implying that the second moment method only gives an exponentially small lower bound on $\Pr[X > 0]$ in spite of the fact that the expected number of NAE-satisfying truth assignments is exponential. Note, also, that according to Table 1 the best known upper bound for k = 5 is 10.505 > 10.

Indeed, the largest value for which the second moment succeeds for k = 5 is r = 9.973... This is depicted in the bottom picture where the three peaks have the same height. For r > 9.973 the peaks near 0 and 1 surpass the one at $\alpha = 1/2$ and the second moment method fails.
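The qualitative behaviour just described can be reproduced with a few lines of code. The sketch below uses the NAE correlation factor $f(\alpha) = 1 - 2^{1-k}(2 - \alpha^k - (1-\alpha)^k)$ derived in Section 4; the function names and the grid resolution are our own choices, and the sketch is meant only as a sanity check of where the maximum sits, not as a replacement for the analysis.

```python
import math

def nae_f(alpha, k):
    """Probability that a random k-clause is NAE-satisfied by two assignments
    with overlap fraction alpha (the factor f of Section 4)."""
    return 1 - 2 ** (1 - k) * (2 - alpha ** k - (1 - alpha) ** k)

def nae_term_root(alpha, k, r):
    """nth root, up to polynomial factors, of the overlap-alpha term of E[X^2]."""
    entropy = -sum(x * math.log(x) for x in (alpha, 1 - alpha) if x > 0)
    return 2 * math.exp(entropy) * nae_f(alpha, k) ** r

def nae_argmax_overlap(k, r, steps=1000):
    """Overlap fraction on a uniform grid that maximizes the term."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda a: nae_term_root(a, k, r))

if __name__ == "__main__":
    for r in (8, 9, 10, 11, 12):
        print(r, nae_argmax_overlap(5, r), nae_term_root(0.5, 5, r))
```

With k = 5 the maximizer sits at 1/2 for r = 8 and r = 9, moves to the neighbourhood of 0.08 (or its mirror image 0.92) for r = 10, and the value at 1/2 falls below 1 for r = 11.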



3. Intuition 

3.1. Reducing the variance 

Given two truth assignments s, t that have overlap z let

$$\mathrm{boost}(z) = \frac{\Pr[t \text{ is satisfying} \mid s \text{ is satisfying}]}{\Pr[t \text{ is satisfying}]} .$$

It is not hard to see that 

To examine one particular source contributing to boost(z) in the case of random k-SAT, it is helpful to introduce the following quantity: given a truth assignment s and a formula F let $Q = Q(s, F)$ be the total number of literal occurrences in F that are satisfied by s. Thus, $Q(s, F)$ is maximized when s assigns each variable its "majority" value.

It is well-known that, with respect to properties that hold w.h.p., $F_k(n, m = rn)$ is equivalent to a random formula generated as follows: first, for each literal $\ell$, generate $R_\ell$ literal occurrences, where the $\{R_\ell\}$ are i.i.d. Poisson random variables with mean $kr/2$; then, partition these literal occurrences randomly into m parts of size k.

In this model we can easily factor $\Pr[s \text{ is satisfying}]$ as $\sum_q \Pr[Q = q] \times \Pr[s \text{ is satisfying} \mid Q = q]$. Clearly, the probability of Q deviating significantly from its expected value $km/2$ is exponentially small. At the same time, though, any such increase in Q affords s tremendous advantage in terms of its likelihood to be satisfying. Moreover, since w.h.p. each variable appears in $O(\log n)$ clauses, this advantage will be very much shared with the truth assignments having large overlap with s, thus contributing heavily to the boost function and, as a result, to $E[X^2]$.

On the other hand, if we consider the probability that s is NAE-satisfying it is clear that s would like Q to be as close as possible to $km/2$. In other words, now the typical case is the most favorable case and the clustering around truth assignments that satisfy many literal occurrences disappears.

Whether this is the main reason for which the second moment method succeeds for random NAE k-SAT remains an interesting question. Considering regular random k-SAT, where all literals are required to appear an equal number of times, seems like an interesting test of this hypothesis.

3.2. Geometry and connections to statistical physics 

Statistical physicists have developed a number of methods for investigating phase transitions which, while non-rigorous, are often in spectacular agreement with numerical and experimental results. One of these methods is the replica trick. The term "replica" comes from the fact that when q is an integer one can compute $E[X^q]$ by considering the interactions between q elementary objects, or "replicas", counted by X. In our case, we consider two truth assignments when calculating the second moment.

At a high level, the replica trick amounts to computing $E[\ln X]$ by calculating $E[X^q]$ for all integer q and then plugging in the resulting formula to the expression $E[\ln X] = \lim_{q \to 0} (E[X^q] - 1)/q$. The fundamental leap of faith, of course, lies in allowing the analytic continuation $q \to 0$ from integer values of q. Even to get this far, however, one has to deal with the often daunting task of computing $E[X^q]$ for all integer q.
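The identity underlying the trick is elementary once q is treated as a real variable. The following lines are a heuristic sketch only, assuming $X > 0$ and that the limit may be exchanged with the expectation; they are not part of the original argument.

```latex
\begin{align*}
X^q = e^{q \ln X} &= 1 + q \ln X + O(q^2) \qquad (q \to 0), \\
\frac{E[X^q] - 1}{q} &= E[\ln X] + O(q)
  \;\xrightarrow[\,q \to 0\,]{}\; E[\ln X] .
\end{align*}
```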

When X counts objects expressed as binary strings, such as satisfying assignments, to calculate $E[X^q]$ one must in general maximize a function of $2^q - 1$ "overlaps", each overlap counting the number of variables assigned a given q-vector of 0/1 values by the q assignments/replicas. (Note that in random [NAE] k-SAT, since variables are negated randomly in each clause, we can take one of the q assignments to be the all 0s, so we only have $2^{q-1} - 1$ overlaps.)

By taking another leap of faith, one can dramatically reduce the dimensionality of this maximization problem to q by assuming replica symmetry, i.e., that the global maximum is symmetric under permutations of the replicas. For satisfiability problems this means that all overlap variables with the same number of 1s in their respective q-vector take the same value. While this assumption is often wrong, it can lead to good approximations. In particular, replica symmetry was assumed in the work of Monasson and Zecchina [?] predicting $r_k \simeq 2^k \ln 2$.

A standard indicator of the plausibility of replica symmetry in a given system is the (usually experimentally measured) distribution of overlaps between randomly chosen ground states, in our case satisfying assignments. If replica symmetry holds, this distribution is tightly peaked around its mean; if not, i.e., if "replica symmetry breaking" takes place, this distribution typically gains multiple peaks or becomes continuous in some open interval.

Intriguingly, the second-moment method is essentially a calculation of the overlap distribution in the annealed approximation, i.e., after we average over random formulas (giving formulas with more satisfying assignments a heavier influence in the overlap distribution). For random NAE k-SAT we saw that, almost all the way to the threshold, the overlap distribution is sharply concentrated around n/2, since when we take nth powers the contribution of all other terms vanishes.

In other words, we have shown that in the annealed approximation, the overlap distribution behaves as if the NAE-satisfying assignments were scattered independently throughout the hypercube.



4. Groundwork 

Let X be the number of NAE-satisfying assignments of $F_k(n, m = rn)$. We start by calculating $E[X]$. For any given assignment s, the probability that a random clause is NAE-satisfied by s is the probability that its k literals are neither all true nor all false. We call this probability $p = 1 - 2^{1-k}$.

Since clauses are drawn independently with replacement and we have $m = rn$ clauses, we see that

$$E[X] = (2p^r)^n . \tag{2}$$



To calculate $E[X^2]$ we first observe that, by linearity of expectation, it is equal to the expected number of ordered pairs of truth assignments s, t such that both s and t are NAE-satisfying. We claim that the probability that a pair of truth assignments s, t are both NAE-satisfying depends only on the number of variables to which they assign the same value (their overlap). In particular, we claim that if s and t have overlap $z = \alpha n$, where $0 \le \alpha \le 1$, then a random k-clause c is NAE-satisfied by both s and t with probability

$$f(\alpha) = 1 - 2 \cdot 2^{1-k} + 2^{1-k}\left(\alpha^k + (1-\alpha)^k\right) = 1 - 2^{1-k}\left(2 - \alpha^k - (1-\alpha)^k\right) . \tag{3}$$

To see this, first recall that the probability of a clause c not being NAE-satisfied by s is $1 - p = 2^{1-k}$. Moreover, if c is not NAE-satisfied by s, then in order for c to also not be NAE-satisfied by t, it must be that either all the variables in c have the same value in t and s, or they all have opposite values. Since s and t have an overlap of $z = \alpha n$ variables and the variables in each clause are distinct, the probability of this last event is $\alpha^k + (1 - \alpha)^k$. Thus, (3) follows by inclusion-exclusion.
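Formula (3) can be checked exactly on a small instance by brute force. The sketch below makes one simplifying assumption of ours: each clause consists of k literals drawn independently and uniformly with replacement (one of the common models mentioned in footnote 1), so that the probability of all k variables falling in the overlap is exactly $\alpha^k$. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def nae_ok(clause, assignment):
    """True if the clause has at least one true and at least one false literal.
    A literal is a pair (variable index, sign); it is true when the variable's
    value equals the sign."""
    values = [assignment[v] == sign for v, sign in clause]
    return any(values) and not all(values)

def both_nae_fraction(s, t, k):
    """Exact fraction of ordered k-tuples of literals (with replacement)
    that are NAE-satisfied by both s and t."""
    n = len(s)
    literals = [(v, sign) for v in range(n) for sign in (True, False)]
    good = sum(nae_ok(c, s) and nae_ok(c, t) for c in product(literals, repeat=k))
    return Fraction(good, len(literals) ** k)

def f(alpha, k):
    """Right-hand side of (3), evaluated in exact rational arithmetic."""
    return 1 - Fraction(2) ** (1 - k) * (2 - alpha ** k - (1 - alpha) ** k)

if __name__ == "__main__":
    s = (True, True, True, True)
    t = (True, True, True, False)      # overlap 3 out of 4
    print(both_nae_fraction(s, t, 3), f(Fraction(3, 4), 3))
```

In this model the enumeration and the closed form agree exactly for every overlap, which is a useful guard against sign errors when reimplementing (3).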

Now, since the number of ordered pairs of assignments with overlap z is $2^n \binom{n}{z}$ and since the $m = rn$ clauses are drawn independently and with replacement we see that

$$E[X^2] = 2^n \sum_{z=0}^{n} \binom{n}{z} f(z/n)^{rn} . \tag{4}$$


We will bound this sum by focusing on its largest terms. The proof of the following lemma, based on standard asymptotic techniques, appears in the Appendix.

Lemma 2 Let F be a real analytic positive function on [0, 1] and define g on [0, 1] as

$$g(\alpha) = \frac{F(\alpha)}{\alpha^{\alpha} (1 - \alpha)^{1 - \alpha}} ,$$

where $0^0 \equiv 1$. If there exists $\alpha_{\max} \in (0, 1)$ such that $g(\alpha_{\max}) \equiv g_{\max} > g(\alpha)$ for all $\alpha \ne \alpha_{\max}$, and $g''(\alpha_{\max}) < 0$, then there exist constants $B, C > 0$ such that for all sufficiently large n

$$B \times g_{\max}^{\,n} \le \sum_{z=0}^{n} \binom{n}{z} F(z/n)^n \le C \times g_{\max}^{\,n} .$$



With Lemma 2 in mind we define

$$g_r(\alpha) = \frac{f(\alpha)^r}{\alpha^{\alpha} (1 - \alpha)^{1 - \alpha}} .$$

We will prove that



Lemma 3 For every $\epsilon > 0$, there exists $k_0 = k_0(\epsilon)$ such that for all $k \ge k_0$, if

$$r < 2^{k-1} \ln 2 - \frac{\ln 2}{2} - \frac{1}{2} - \epsilon$$

then $g_r(\alpha) < g_r(1/2)$ for all $\alpha \ne 1/2$, and $g_r''(1/2) < 0$.

Therefore, for all r, k, $\epsilon$ as in Lemma 3

$$E[X^2] < C \times (2 g_r(1/2))^n ,$$

where $C = C(k)$ is independent of n. At the same time, observe that $E[X]^2 = (4p^{2r})^n = (2 g_r(1/2))^n$. Therefore, for all r, k, $\epsilon$ as in Lemma 3

$$\frac{E[X^2]}{E[X]^2} < C$$

which, by Lemma 1, implies

$$\Pr[X > 0] > 1/C .$$

Thus, along with Corollary 1, Lemma 3 suffices to establish Theorems 1 and 2.

5. Proof of Lemma 3

We wish to show that $g_r''(1/2) < 0$ and that $g_r(\alpha) < g_r(1/2)$ for all $\alpha \ne 1/2$. Since $g_r$ is symmetric around 1/2, we can restrict to $\alpha \in (1/2, 1]$. We will divide this interval into two parts and handle them with two separate lemmata. The first lemma deals with $\alpha \in (1/2, 0.9]$ and also establishes that $g_r''(1/2) < 0$.

Lemma 4 Let $\alpha \in (1/2, 0.9]$. For all $k \ge 74$, if $r < 2^{k-1} \ln 2$ then $g_r(\alpha) < g_r(1/2)$ and $g_r''(1/2) < 0$.

The second lemma deals with $\alpha \in (0.9, 1]$.

Lemma 5 Let $\alpha \in (0.9, 1]$. For every $\epsilon > 0$ and all $k \ge k_0(\epsilon)$, if $r < 2^{k-1} \ln 2 - \frac{\ln 2}{2} - \frac{1}{2} - \epsilon$ then $g_r(\alpha) < g_r(1/2)$.

Combining Lemmata 4 and 5 we see that for every $\epsilon > 0$, there exists $k_0 = k_0(\epsilon)$ such that for all $k \ge k_0$ if

$$r < 2^{k-1} \ln 2 - \frac{\ln 2}{2} - \frac{1}{2} - \epsilon$$

then $g_r(\alpha) < g_r(1/2)$ for all $\alpha \ne 1/2$ and $g_r''(1/2) < 0$, establishing Lemma 3. We prove Lemmata 4 and 5 below.

The reader should keep in mind that we have made no attempt to optimize the value of $k_0$ in Lemma 5, opting instead for proof simplicity.

Proof of Lemma 4. We will first prove that for $k \ge 74$, $g_r$ is strictly decreasing in $\alpha \in (1/2, 0.9]$, thus establishing $g_r(\alpha) < g_r(1/2)$. Since $g_r$ is positive, to do this it suffices to prove that $(\ln g_r)' = g_r'/g_r < 0$ in this interval. In fact, since $g_r'(\alpha) = (\ln g_r)' = 0$ at $\alpha = 1/2$, it will suffice to prove that for $\alpha \in [1/2, 0.9]$ we have $(\ln g_r)'' < 0$. Now,

$$(\ln g_r(\alpha))'' = r \left( \frac{f''(\alpha)}{f(\alpha)} - \frac{f'(\alpha)^2}{f(\alpha)^2} \right) - \frac{1}{\alpha(1 - \alpha)} \le r\, \frac{f''(\alpha)}{f(\alpha)} - \frac{1}{\alpha(1 - \alpha)} . \tag{5}$$

To show that the r.h.s. of (5) is negative we first note that for $\alpha \ge 1/2$ and $k > 3$,

$$f''(\alpha) = 2^{1-k} k(k-1) \left( \alpha^{k-2} + (1 - \alpha)^{k-2} \right) < 2^{2-k} \alpha^{k-2} k^2$$

is monotonically increasing. Therefore,

$$f''(\alpha) \le f''(0.9) < 2^{2-k}\, 0.9^{k-2}\, k^2 .$$

Moreover, for all $\alpha$, $f(\alpha) \ge f(1/2) = (1 - 2^{1-k})^2$. Therefore, since $1/(\alpha(1 - \alpha)) \ge 4$ and $r < 2^{k-1} \ln 2$, it suffices to observe that for all $k \ge 74$,

$$(2^{k-1} \ln 2) \times \frac{2^{2-k}\, 0.9^{k-2}\, k^2}{(1 - 2^{1-k})^2} - 4 < 0 .$$

Finally, recalling that $g_r'(1/2) = 0$ and using

$$(\ln g_r)'' = \frac{g_r''}{g_r} - \left( \frac{g_r'}{g_r} \right)^2$$

we see that $g_r''(1/2) < 0$ since $(\ln g_r)''(1/2) < 0$.
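The constant 74 in Lemma 4 can be sanity-checked numerically. The sketch below evaluates the final inequality of the proof, taking $f(1/2) = (1 - 2^{1-k})^2$ as given by (3); the function name and the search range are our own, and the check complements rather than replaces the argument.

```python
import math

def lemma4_slack(k):
    """Left-hand side of the last inequality in the proof of Lemma 4:
    (2^(k-1) ln 2) * 2^(2-k) * 0.9^(k-2) * k^2 / f(1/2) - 4.
    The powers of two cancel, leaving 2 ln 2 * 0.9^(k-2) * k^2 / f(1/2) - 4."""
    f_half = (1 - 2 ** (1 - k)) ** 2
    return 2 * math.log(2) * 0.9 ** (k - 2) * k ** 2 / f_half - 4

if __name__ == "__main__":
    first_ok = next(k for k in range(2, 200) if lemma4_slack(k) < 0)
    print(first_ok)
```

Within the range searched, the inequality first holds at k = 74 and continues to hold for every larger k examined, since $0.9^{k-2} k^2$ is decreasing there.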

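Proof of Lemma 5. By the definition of $g_r$ we see that $g_r(\alpha) < g_r(1/2)$ if and only if

$$\left( \frac{f(\alpha)}{f(1/2)} \right)^{r} < 2\, \alpha^{\alpha} (1 - \alpha)^{1 - \alpha} . \tag{6}$$

Letting $h(\alpha) = -\alpha \ln \alpha - (1 - \alpha) \ln(1 - \alpha)$ denote the entropy function, we see that (6) holds as long as

$$\frac{r}{\ln 2 - h(\alpha)} < \frac{1}{\ln(1 + w)}$$

where

$$w = \frac{f(\alpha) - f(1/2)}{f(1/2)} .$$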

Observe now that for $k \ge 3$, f is strictly increasing in (1/2, 1], so $w > 0$. Moreover, for any $x > 0$

$$\frac{1}{\ln(1 + x)} \ge \frac{1}{x} + \frac{1}{2} - \frac{x}{12} .$$

Since $f(\alpha) - f(1/2) = 2^{1-k} \left( \alpha^k + (1 - \alpha)^k - 2^{1-k} \right) < 2^{1-k}$ and $f(1/2) = (1 - 2^{1-k})^2 > 1 - 2^{2-k}$, we thus see that it suffices to have

$$\frac{r}{\ln 2 - h(\alpha)} \le \frac{2^{k-1} - 2}{\alpha^k + (1 - \alpha)^k - 2^{1-k}} + \frac{1}{2} - \frac{2^{1-k}}{12} . \tag{7}$$
Now observe that for any $0 < \alpha < 1$ and $0 \le q < \alpha^k$,

$$\frac{1}{\alpha^k - q} \ge 1 + k(1 - \alpha) + q .$$

Since $\alpha > 1/2$ we can set $q = 2^{1-k} - (1 - \alpha)^k$, yielding

$$\frac{1}{\alpha^k + (1 - \alpha)^k - 2^{1-k}} \ge 1 + k(1 - \alpha) + 2^{1-k} - (1 - \alpha)^k .$$

Since $2^k (1 - \alpha)^k < 5^{-k}$, we find that (7) holds as long as $r < \phi(\alpha) - 2^{-k}$ where

$$\phi(\alpha) = (\ln 2 - h(\alpha)) \left( 2^{k-1} + (2^{k-1} - 2)\, k (1 - \alpha) - \frac{1}{2} \right) .$$

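We are thus left to minimize $\phi$ in (0.9, 1]. Since $\phi$ is analytic its minima can only occur at 0.9 or 1, or where $\phi' = 0$. The derivative of $\phi$ is

$$\phi'(\alpha) = (2^{k-1} - 2) \times \left[ -k (\ln 2 - h(\alpha)) + (\ln \alpha - \ln(1 - \alpha)) \left( 1 + k(1 - \alpha) + \frac{3}{2^k - 4} \right) \right] . \tag{8}$$

Note now that for all $k > 1$

$$\lim_{\alpha \to 1} \phi'(\alpha) = -\frac{2^k - 1}{2} \lim_{\alpha \to 1} \ln(1 - \alpha)$$

is positively infinite. At the same time,

$$\phi'(0.9) < -0.07 \times 2^k k + 1.1\, (2^k - 1) + 0.3\, k$$

is negative for $k \ge 16$. Therefore, $\phi$ is minimized in the interior of (0.9, 1] for all $k \ge 16$. Setting $\phi'$ to zero gives

$$-\ln(1 - \alpha) = \frac{k (\ln 2 - h(\alpha))}{1 + k(1 - \alpha) + 3/(2^k - 4)} - \ln \alpha . \tag{9}$$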



By "bootstrapping" we derive a tightening series of lower bounds on the solution for the l.h.s. of (9) for $\alpha \in (0.9, 1)$. Note first that we have an easy upper bound,

$$-\ln(1 - \alpha) < k \ln 2 - \ln \alpha . \tag{10}$$

At the same time, if $k > 2$ then $3/(2^k - 4) < 1$, implying

$$-\ln(1 - \alpha) > \frac{k (\ln 2 - h(\alpha))}{2 + k(1 - \alpha)} - \ln \alpha . \tag{11}$$

If we write $k(1 - \alpha) = B$ then (11) becomes

$$-\ln(1 - \alpha) > \frac{\ln 2 - h(\alpha)}{1 - \alpha} \left( \frac{B}{B + 2} \right) - \ln \alpha . \tag{12}$$

By inspection, if $B \ge 3$ the r.h.s. of (12) is greater than the l.h.s. for all $\alpha > 0.9$, yielding a contradiction. Therefore, $k(1 - \alpha) < 3$ for all $k > 2$. Since $\ln 2 - h(\alpha) > 0.36$ for $\alpha > 0.9$, we see that for $k > 2$, (11) implies

$$-\ln(1 - \alpha) > 0.07\, k . \tag{13}$$



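Observe now that, by (13), $k(1-\alpha) < k e^{-0.07k}$ and, hence,
as $k$ increases the denominator of (9) approaches 1.
To bootstrap, we note that since $\alpha > 1/2$ we have

\[ h(\alpha) \le -2(1-\alpha)\ln(1-\alpha) \qquad (14) \]
\[ \phantom{h(\alpha)} < 2e^{-0.07k}\,(k \ln 2 - \ln 0.9) \qquad (15) \]
\[ \phantom{h(\alpha)} < 2k e^{-0.07k} , \]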



where (15) relies on (10), (13). Moreover, $\alpha > 1/2$ implies
$-\ln\alpha < 2(1-\alpha) < 2e^{-0.07k}$. Thus, by using the bound on $h(\alpha)$ above and
the fact $1/(1+x) > 1 - x$ for all $x > 0$, (9) gives for $k \ge 3$,

\[ \begin{aligned}
-\ln(1-\alpha) &> \frac{k(\ln 2 - h(\alpha))}{1 + k(1-\alpha) + 3/(2^k - 4)} \\
&> \frac{k(\ln 2 - 2ke^{-0.07k})}{1 + 2ke^{-0.07k}} \\
&> k(\ln 2 - 2ke^{-0.07k})(1 - 2ke^{-0.07k}) \\
&> k\ln 2 - 4k^2 e^{-0.07k} .
\end{aligned} \qquad (16) \]



For $k \ge 166$, $4k^2 e^{-0.07k} < 1$. Thus, by (16), we have
$1 - \alpha < 3 \times 2^{-k}$. This, in turn, implies $-\ln\alpha < 2(1-\alpha) < 6 \times 2^{-k}$
and so, by (10) and (14), we have for $\alpha > 0.9$

\[ h(\alpha) < 6 \times 2^{-k}\,(k \ln 2 - \ln\alpha) < 5k\,2^{-k} . \qquad (17) \]
Plugging (17) into (9) to bootstrap again, we get that for $k \ge 3$

\[ \begin{aligned}
-\ln(1-\alpha) &> \frac{k(\ln 2 - 5k2^{-k})}{1 + 3k2^{-k} + 3/(2^k - 4)} \\
&> \frac{k(\ln 2 - 5k2^{-k})}{1 + 6k2^{-k}} \\
&> k(\ln 2 - 5k2^{-k})(1 - 6k2^{-k}) \\
&> k\ln 2 - 11k^2 2^{-k} .
\end{aligned} \]

Since $e^x < 1 + 2x$ for $0 < x < 1$ and $11k^2 2^{-k} < 1$ for $k > 10$,
we see that for such $k$

\[ 1 - \alpha < 2^{-k} + 22k^2 2^{-2k} . \]
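The two numerical thresholds used in the bootstrapping ($k \ge 166$ for $4k^2e^{-0.07k} < 1$ and $k > 10$ for $11k^2 2^{-k} < 1$) are easy to confirm by direct evaluation. A small sketch, with the hypothetical helper `first_k_below_one` scanning a finite range only:

```python
import math

def first_k_below_one(term, k_max=1000):
    """Smallest k0 such that term(k) < 1 for every k with k0 <= k <= k_max."""
    k0 = k_max + 1
    for k in range(k_max, 0, -1):   # scan downward until the bound first fails
        if term(k) >= 1:
            break
        k0 = k
    return k0

k_exp = first_k_below_one(lambda k: 4 * k * k * math.exp(-0.07 * k))
k_pow = first_k_below_one(lambda k: 11 * k * k * 2.0 ** (-k))
print(k_exp, k_pow)
```

Both sequences are eventually decreasing, so the scan over $k \le 1000$ captures the true thresholds.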



Plugging into (10) the fact $-\ln\alpha < 6 \times 2^{-k}$ we get
$-\ln(1-\alpha) < k\ln 2 + 6 \times 2^{-k}$. Using that $e^{-x} \ge 1 - x$
for $x \ge 0$, we get the closely matching upper bound,

\[ 1 - \alpha > 2^{-k} - 6 \times 2^{-2k} . \]



Thus, we see that for $k \ge 166$, $\phi$ is minimized at an $\alpha_{\min}$
which is within $\delta$ of $1 - 2^{-k}$, where $\delta = 22k^2 2^{-2k}$. Let
$T$ be the interval $[1 - 2^{-k} - \delta, 1 - 2^{-k} + \delta]$. Clearly the
minimum of $\phi$ is at least $\phi(1 - 2^{-k}) - \delta \times \max_{\alpha \in T} |\phi'(\alpha)|$.
It is easy to see from (8) that if $\alpha \in T$ then $|\phi'(\alpha)| \le 2k\,2^k$.

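Now, a simple calculation using that $\ln(1 - 2^{-k}) > -2^{-k} - 2^{-2k}$ for $k \ge 1$ gives

\[ \begin{aligned}
\phi(1 - 2^{-k}) &= \frac{1}{2}\left( (2^k - k)\ln 2 + (2^k - 1)\ln(1 - 2^{-k}) \right) \times \left( 1 + (k-1)2^{-k} - k2^{2-2k} \right) \\
&> 2^{k-1}\ln 2 - \frac{\ln 2}{2} - \frac{1}{2} - k^2 2^{-k} .
\end{aligned} \]

Therefore,

\[ \phi(\alpha_{\min}) > 2^{k-1}\ln 2 - \frac{\ln 2}{2} - \frac{1}{2} - 45k^3 2^{-k} . \]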



Finally, recall that (7) holds as long as $r < \phi(\alpha_{\min}) - 2^{-k}$, i.e.,

\[ r < 2^{k-1}\ln 2 - \frac{\ln 2}{2} - \frac{1}{2} - 46k^3 2^{-k} . \]

Clearly, we can take $k_0 = O(\ln \epsilon^{-1})$ so that for all $k \ge k_0$
the error term $46k^3 2^{-k}$ is smaller than any $\epsilon > 0$.
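To make the dependence of $k_0$ on $\epsilon$ concrete, the sketch below computes, for a few sample values of $\epsilon$, the smallest $k_0$ beyond which $46k^3 2^{-k} < \epsilon$. The function names are hypothetical and the search is capped at an arbitrary `k_max`; it only illustrates that $k_0$ grows roughly like $\log_2(1/\epsilon)$.

```python
def error_term(k):
    """The error term 46 k^3 2^-k from the final bound on r."""
    return 46 * k ** 3 * 2.0 ** (-k)

def k0_for(eps, k_max=1000):
    """Smallest k0 such that error_term(k) < eps for every k0 <= k <= k_max."""
    k0 = k_max + 1
    for k in range(k_max, 0, -1):   # the term is eventually decreasing, so scan downward
        if error_term(k) >= eps:
            break
        k0 = k
    return k0

for eps in (1e-1, 1e-2, 1e-4, 1e-8):
    print(eps, k0_for(eps))
```

Squaring $\epsilon$ (for example going from $10^{-4}$ to $10^{-8}$) increases $k_0$ by far less than a factor of two, in line with the logarithmic dependence.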

6. Conclusions 

We have shown that the second moment method can be
used to determine the random $k$-SAT threshold within
a factor of 2. We also showed that it gives extraordinarily
tight bounds for random NAE $k$-SAT, determining the
threshold for that problem within a small additive constant.

At this point, it seems vital to understand the following:

1. Why does the second moment method perform so well
for NAE $k$-SAT? The symmetry of this problem explains
why the method gives a non-trivial bound, but
not why it gives essentially the exact answer.

2. How can we close the factor of 2 gap for the random
$k$-SAT threshold? Are there other large subsets of
satisfying assignments that are not strongly correlated?

3. Does the geometry of the set of satisfying assignments
have any implications for algorithms? Perhaps more
modestly(?), is there a polynomial-time algorithm that
succeeds with positive probability for $r = \omega(k)\, 2^k/k$,
where $\omega(k) \to \infty$? What about $\omega(k) = \Theta(k)$?
A. Proof of Lemma 2

The idea is that because of the binomial coefficient, the
sum only has $\Theta(\sqrt{n})$ "significant" terms, each of which is
of size $\Theta(g_{\max}^n / \sqrt{n})$. The proof amounts to replacing the
sum by an integral and then using the Laplace method for
asymptotic integrals [^].

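We prove the upper bound first. Recall the following
form of Stirling's approximation, valid for all $n > 0$:

\[ n! > \sqrt{2\pi n}\left(\frac{n}{e}\right)^n \left(1 + \frac{1}{12n}\right), \qquad n! < \sqrt{2\pi n}\left(\frac{n}{e}\right)^n \left(1 + \frac{1}{6n}\right) . \]

Thus, for any $0 < z \le n/2$, letting $\alpha = z/n$ we have

\[ \begin{aligned}
\binom{n}{\alpha n} &< \frac{1}{\sqrt{2\pi n}}\,\frac{1}{\sqrt{\alpha(1-\alpha)}} \left( \frac{1}{\alpha^{\alpha}(1-\alpha)^{1-\alpha}} \right)^{n} \frac{1 + 1/(6n)}{1 + 1/(12z)} \\
&\le \frac{1}{\sqrt{2\pi n}}\,\frac{1}{\sqrt{\alpha(1-\alpha)}} \left( \frac{1}{\alpha^{\alpha}(1-\alpha)^{1-\alpha}} \right)^{n}
\end{aligned} \qquad (18) \]

and, similarly, for any $0 < z < n$ we have

\[ \begin{aligned}
\binom{n}{\alpha n} &> \frac{1}{\sqrt{2\pi n}}\,\frac{1}{\sqrt{\alpha(1-\alpha)}} \left( \frac{1}{\alpha^{\alpha}(1-\alpha)^{1-\alpha}} \right)^{n} \times \frac{1}{(1 + 1/(6z))(1 + 1/(6(n-z)))} \\
&\ge \frac{36}{49}\,\frac{1}{\sqrt{2\pi n}}\,\frac{1}{\sqrt{\alpha(1-\alpha)}} \left( \frac{1}{\alpha^{\alpha}(1-\alpha)^{1-\alpha}} \right)^{n} .
\end{aligned} \]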
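The two-sided estimate on the binomial coefficient can be checked exactly for small $n$. The sketch below is an illustration under stated assumptions: `stirling_shape` (a hypothetical name) evaluates $\frac{1}{\sqrt{2\pi n}}\frac{1}{\sqrt{\alpha(1-\alpha)}}\bigl(\alpha^{\alpha}(1-\alpha)^{1-\alpha}\bigr)^{-n}$ with $\alpha = z/n$, and the check covers every $0 < z < n$, using the symmetry $z \leftrightarrow n - z$ of both sides for the upper bound.

```python
import math

def stirling_shape(n, z):
    """(1/sqrt(2 pi n)) * (1/sqrt(a(1-a))) * (a^a (1-a)^(1-a))^(-n) with a = z/n."""
    a = z / n
    entropy = -a * math.log(a) - (1 - a) * math.log(1 - a)
    return math.exp(n * entropy) / math.sqrt(2 * math.pi * n * a * (1 - a))

def bounds_hold(n):
    """Check (36/49) S < C(n, z) < S for every 0 < z < n, where S = stirling_shape(n, z)."""
    return all(
        36 / 49 * stirling_shape(n, z) < math.comb(n, z) < stirling_shape(n, z)
        for z in range(1, n)
    )

checked = all(bounds_hold(n) for n in range(2, 201))
print(checked)
```

The ratio $\binom{n}{z} / S$ stays well inside $(36/49, 1)$ in this range, so floating-point rounding does not affect the comparison.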



To prove the upper bound, we use (18) to write

\[ \sum_{z=0}^{n} \binom{n}{z} F(z/n)^n \le \frac{1}{\sqrt{2\pi n}} \sum_{0 < z < n} \frac{g(z/n)^n}{\sqrt{(z/n)(1 - z/n)}} + F(0)^n + F(1)^n . \qquad (19) \]



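Let $\epsilon = \min\{\alpha_{\max}, 1 - \alpha_{\max}\}/2 > 0$. Let $g_* < g_{\max}$ be
the maximum value of $g$ in $[0, \epsilon] \cup [1 - \epsilon, 1]$. Since $g_* \ge g(0) = F(0)$
and $g_* \ge g(1) = F(1)$, using (19) we get

\[ \begin{aligned}
\sum_{z=0}^{n} \binom{n}{z} F(z/n)^n &\le \frac{1}{\sqrt{2\pi n}} \sum_{z=\epsilon n}^{(1-\epsilon)n} \frac{g(z/n)^n}{\sqrt{(z/n)(1 - z/n)}} + n^{3/2} g_*^n \\
&\le \frac{1}{\sqrt{2\pi n}}\,\frac{1}{\sqrt{\epsilon(1-\epsilon)}} \times \sum_{z=\epsilon n}^{(1-\epsilon)n} g(z/n)^n + n^{3/2} g_*^n .
\end{aligned} \qquad (20) \]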



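Next, we wish to replace the sum in (20) with an integral.
We first recall that for any integrable function $\phi$ that is
monotone in $[a, b]$

\[ \left| \sum_{j=0}^{n} \phi\!\left( a + j\,\frac{b-a}{n} \right) - \frac{n}{b-a} \int_a^b \phi(x)\,dx \right| \le \max\{\phi(a), \phi(b)\} . \]

Therefore if $\phi$ has $M$ extrema in $[a, b]$, we can divide $[a, b]$
into $M + 1$ intervals on which $\phi$ is monotone, giving

\[ \left| \sum_{j=0}^{n} \phi\!\left( a + j\,\frac{b-a}{n} \right) - \frac{n}{b-a} \int_a^b \phi(x)\,dx \right| \le (M + 1) \times \max_{a \le x \le b} \phi(x) . \qquad (21) \]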
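The sum-versus-integral comparison can be illustrated on two simple nonnegative test functions whose integrals are known in closed form. This is only a sketch: the inequality is taken in the form $\bigl|\sum_{j=0}^{n} \phi(a + j(b-a)/n) - \frac{n}{b-a}\int_a^b \phi\bigr| \le (M+1)\max\phi$, and the helper name and test functions are illustrative choices, not taken from the text.

```python
import math

def sum_minus_scaled_integral(f, integral, a, b, n):
    """Sum_{j=0}^{n} f(a + j(b-a)/n) minus (n/(b-a)) times the exact integral of f over [a, b]."""
    total = sum(f(a + j * (b - a) / n) for j in range(n + 1))
    return total - n / (b - a) * integral

# Monotone case (M = 0): f = exp on [0, 1], integral e - 1, max{f(0), f(1)} = e.
gap_monotone = [
    sum_minus_scaled_integral(math.exp, math.e - 1, 0.0, 1.0, n) for n in (1, 10, 1000)
]

# One interior extremum (M = 1): f(x) = 4x(1-x) on [0, 1], integral 2/3, maximum value 1.
def bump(x):
    return 4 * x * (1 - x)

gap_bump = [sum_minus_scaled_integral(bump, 2 / 3, 0.0, 1.0, n) for n in (1, 10, 1000)]
print(gap_monotone, gap_bump)
```

In both cases the discrepancy stays bounded as $n$ grows, which is what allows the sum in the proof to be replaced by $n$ times an integral at the cost of an additive $O(g_{\max}^n)$ term.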



Observe now that $y^n$ is a strictly increasing function of $y$ in
$[0, \infty)$ implying that $g^n$ is extremized at exactly the same
$\alpha \in [0, 1]$ as $g$. Since $g$ is independent of $n$ and analytic on
the closed interval $[\epsilon, 1 - \epsilon]$, it follows that it has at most $M$
extrema in $[\epsilon, 1 - \epsilon]$ for some constant $M$, and therefore so
does $g^n$ for all $n > 0$. Finally, since $g_{\max} > g(\alpha)$ for all
$\alpha \ne \alpha_{\max}$ we get that for all sufficiently large $n$,
$g_{\max}^n > n^{3/2} g_*^n$. Thus, using (21), we can rewrite (20) as

\[ \sum_{z=0}^{n} \binom{n}{z} F(z/n)^n \le \frac{1}{\sqrt{2\pi n}}\,\frac{1}{\sqrt{\epsilon(1-\epsilon)}} \left( n \int_{\epsilon}^{1-\epsilon} g(\alpha)^n\,d\alpha + (M + 2)\, g_{\max}^n \right) . \qquad (22) \]



To deal with the integral in (22) we will use the Laplace
method for asymptotic integrals. The following lemma can
be found in [^, §4.2]:

Lemma 6 Let $h$ be a real continuous function. Assume that
there exist $x_0$ and $b, c > 0$ such that: i) $h(x) < h(x_0)$ if
$x \ne x_0$, ii) $h(x) \le h(x_0) - b$ if $|x - x_0| \ge c$, and iii)
$h''(x_0) < 0$. If $\int_{-\infty}^{\infty} e^{h(x)}\,dx$ converges, then for any $\epsilon > 0$
and all sufficiently large $t$,

\[ \int_{-\infty}^{\infty} e^{t h(x)}\,dx < e^{t h(x_0)} \sqrt{\frac{2\pi}{(-h''(x_0) - 3\epsilon)\,t}} \]

and there is a similar lower bound for any $\epsilon < 0$.
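A quick numerical illustration of the Laplace method may help here. The sketch below makes an arbitrary illustrative choice, $F \equiv 1/2$, so that $g(\alpha) = \frac{1}{2}\,\alpha^{-\alpha}(1-\alpha)^{-(1-\alpha)}$ has its unique maximum $g_{\max} = 1$ at $\alpha = 1/2$ and $h = \ln g$ satisfies $h''(1/2) = -4$. It compares a midpoint-rule value of $\int_0^1 g(\alpha)^n\,d\alpha$ with the leading-order prediction $g_{\max}^n\sqrt{2\pi/(-h''(1/2)\,n)}$. All names are hypothetical.

```python
import math

def log_g(alpha):
    """ln g(a) for the illustrative choice F = 1/2: entropy in nats minus ln 2."""
    return -alpha * math.log(alpha) - (1 - alpha) * math.log(1 - alpha) - math.log(2)

def integral_of_g_power(n, steps=100_000):
    """Midpoint-rule approximation of the integral of g(a)^n over (0, 1)."""
    width = 1.0 / steps
    return width * sum(math.exp(n * log_g((i + 0.5) * width)) for i in range(steps))

def laplace_estimate(n):
    """Leading-order Laplace value g_max^n * sqrt(2 pi / (-h''(1/2) n)), with g_max = 1 and h''(1/2) = -4."""
    return math.sqrt(2 * math.pi / (4 * n))

n = 2000
ratio = integral_of_g_power(n) / laplace_estimate(n)
print(ratio)
```

The ratio is very close to 1 for large $n$, which is the content of the lemma: up to a factor that can be made arbitrarily close to 1, the integral is $g_{\max}^n$ times a $\Theta(1/\sqrt{n})$ Gaussian width.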

To apply this lemma, we set $t = n$, and take any continuous
$h$ such that $h(x) = \ln g(x)$ for $x \in (0, 1)$, and
such that $h(x)$ goes to $-\infty$ as $|x| \to \infty$ sufficiently fast
so that $\int e^{h(x)}\,dx$ converges. Observe that since $\ln y$ is
strictly monotone in $(0, \infty)$, $h$ is extremized at the same $x$
as $g$. Clearly, condition ii) of Lemma 6 is also satisfied and
since $[\ln g(x)]'' = g''(x)/g(x) - (g'(x)/g(x))^2$, we see
that $h''(\alpha_{\max}) = g''(\alpha_{\max})/g(\alpha_{\max}) < 0$. Therefore, for
all sufficiently large $n$



E(/;i 

z=0 



< 



1 



27m y/e(l - e) 




2itg 



max „ 



-g"(u max ) n 



In order to prove the lower bound, again we take $\epsilon = \min\{\alpha_{\max}, 1 - \alpha_{\max}\}/2 > 0$,
and discard all the terms of the sum for which $\alpha \notin [\epsilon, 1 - \epsilon]$.
Since $1/\sqrt{\alpha(1-\alpha)} \ge 2$, we have

\[ \sum_{z=0}^{n} \binom{n}{z} F(z/n)^n \ge \sum_{z=\epsilon n}^{(1-\epsilon)n} \binom{n}{z} F(z/n)^n \ge \frac{36}{49}\,\frac{2}{\sqrt{2\pi n}} \sum_{z=\epsilon n}^{(1-\epsilon)n} g(z/n)^n . \]

Replacing this sum by an integral as before and using the
lower bound of Lemma 6 gives



> 




2irg n 



-5"(a max )n 



B x g 



